Abstract. - In this Letter, based on user-tag-object tripartite graphs,
we propose a recommendation algorithm that makes use of social tags.
Besides its low computational cost, experimental results on two
real-world data sets, Del.icio.us and MovieLens, show that it can
enhance algorithmic accuracy and diversity. In particular, it provides
more personalized recommendations when the assigned tags belong to
diverse topics. The proposed algorithm is particularly effective for
small-degree objects, which reminds us of the well-known cold-start
problem in recommender systems. Further empirical study shows that the
proposed algorithm can significantly alleviate this problem in social
tagging systems with heterogeneous object degree distributions.



Introduction. - Many complex systems can be well described by networks
where nodes represent individuals and edges denote the relations among
them [1-5]. Recently, personalized recommendation in complex networks
has attracted increasing attention from physicists [6-10]. Personalized
recommendation aims at finding objects (e.g., books, webpages, music)
that are most likely to be collected by users. For example, classical
information retrieval can be viewed as recommending documents with
given words [11], and link prediction can be considered as a
recommendation problem in unipartite networks [12-14]. The central
problem of personalized recommendation can be divided into two parts:
one is the estimation of similarity based on the historical records of
user activities [15,16]; the other is the use of accessorial
information (e.g., object attributes) to efficiently filter out
irrelevant objects. For the former task, since computing and storing
the similarities of all user pairs is costly, one usually considers
only the top-k most similar users [17]. For the latter task, very
accurate descriptions of objects may help filter out irrelevant
objects. However, such descriptions are limited to the attribute
vocabulary and, moreover, attributes describe global properties of
objects, which are less helpful for generating personalized
recommendations.

Recently, the advent of Web 2.0 and its affiliated applications has
brought a new paradigm, social tagging systems (also called
collaborative tagging systems), which provide a novel platform for
user participation. A social tagging system allows users to freely
assign tags to annotate their collections, requires no specific skills
to participate in, and broadens the semantic relations among users and
objects; it has thus attracted much attention from the scientific
community. Golder et al. studied its usage patterns and classified
seven kinds of tag functions [18]. Similar to tags, keywords and PACS
numbers have been analyzed to better characterize the structure of
co-authorship and citation networks [19,20]. Furthermore, much effort
has been devoted to explaining the emergent properties of social
tagging systems. Cattuto et al. [21] proposed a memory-based
Yule-Simon model to describe the aging effects and occurrence
frequencies of tags. Zhang and Liu [22] proposed an evolutionary
hypergraph model, where users not only assign tags to objects but also
retrieve objects via tags.

Fig. 1: (Color online) Object-degree-dependent ranking score for the
three algorithms in Del.icio.us and MovieLens. Each data point is
obtained by averaging over 50 realizations, each of which corresponds
to an independent division of training set and testing set.

Fig. 2: Object degree distributions of the two data sets. The insets
show the cumulative distributions.

Besides, social tagging systems have already found wide application in
recommender systems. By considering the tag frequency as a weight,
Szomszor et al. [23] proposed an improved movie recommendation
algorithm. Schenkel et al. [24] proposed an incremental threshold
algorithm that takes into account both the social ties among users and
the semantic relations between tags, and that performs remarkably
better than the algorithm without tag expansion. Zhang et al. [25] and
Shang et al. [26] proposed an object-based and a user-based hybrid tag
algorithm, respectively, harnessing diffusion-based methods to obtain
better recommendations. Shang and Zhang [27] considered the tag usage
frequency as the edge weight in a user-object bipartite network and
improved the accuracy of recommendation.

In this Letter, we propose a diffusion-based recommendation algorithm
that considers social tags as a bridge connecting users and objects;
that is to say, users can efficiently find target objects via tags. In
particular, we treat the usage frequencies of tags as users' personal
preferences, and the semantic relations between tags and objects as
global information. Experimental results show that the present
algorithm can significantly improve recommendation accuracy. Further
empirical study shows that the proposed algorithm is especially
effective for objects collected by few users, which reminds us of the
well-known cold-start problem [28,29]. Since little information is
available for new objects, social tags can effectively build up
relations between existing objects and new ones. Therefore,
incorporating tags can remarkably help users find new (or less
popular) yet interesting objects, and thus enhance the overall
accuracy. In addition, we employ entropy-based and
Hamming-distance-based methods to measure the inner- and
inter-diversity of tag usage patterns, respectively. Experimental
results show that tag usage patterns differ between the two data sets:
users assign more diverse tags in Del.icio.us than in MovieLens, which
might shed light on why the proposed algorithm enhances the
recommendation diversity more in Del.icio.us than in MovieLens.

Data. - The empirical data used in this paper include: (i)
Del.icio.us, one of the most popular social bookmarking web sites,
which allows users not only to store and organize personal bookmarks
(URLs), but also to look into other users' collections and find what
they might be interested in by simply keeping track of the baskets
with social tags; (ii) MovieLens, a movie rating system where each
user rates movies on a discrete scale from 1 to 5, and to which a
tagging function was added in January 2006. In both data sets, we
remove the isolated nodes and guarantee that each user has collected
at least one object, each object has been collected by at least two
users and assigned at least two tags, each tag is used by at least two
users, and each tag is used at least twice by every adjacent user.
Table 1 summarizes the basic statistics of the purified data sets.

Fig. 3: (Color online) $\langle InterD \rangle$ as a function of the
length of the recommendation list for the three algorithms in
Del.icio.us and MovieLens.

Fig. 4: (Color online) $\langle InnerD \rangle$ as a function of the
length of the recommendation list for the three algorithms in
Del.icio.us and MovieLens.



Each data set consists of many entries, each of the form
$F = \{\mathrm{user}, \mathrm{object}, \mathrm{tag}_1, \mathrm{tag}_2, \dots, \mathrm{tag}_t\}$,
where $t$ is the number of tags assigned to this object by this user.
Each data set is then randomly divided into two parts: the training
set, which is treated as known information, and the testing set, which
is used for testing. In this Letter, the training set always contains
90% of the entries and the remaining 10% constitute the testing set.
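
The random 90%/10% division described above can be sketched as follows. This is only an illustrative sketch: the function name `split_entries`, the fixed seed and the toy entries are assumptions made for the example and are not taken from the original experiments.

```python
import random

def split_entries(entries, train_fraction=0.9, seed=0):
    """Randomly divide tagging entries into a training and a testing set."""
    rng = random.Random(seed)      # fixed seed only to make the sketch repeatable
    shuffled = list(entries)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy entries of the form (user, object, [tag_1, ..., tag_t]).
entries = [(u, o, ["t%d" % (o % 3)]) for u in range(5) for o in range(4)]
train, test = split_entries(entries)
```

Repeating the call with different seeds gives independent divisions, in the spirit of the 50 realizations mentioned in the figure captions.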

Algorithms. - The recommender system considered in this Letter
consists of three sets: users $U = \{U_1, U_2, \dots, U_n\}$, objects
$O = \{O_1, O_2, \dots, O_m\}$, and tags $T = \{T_1, T_2, \dots, T_r\}$.
The tripartite graph can be described by three matrices, $A$, $A'$ and
$A''$, for the user-object, object-tag and user-tag relations,
respectively. If $U_i$ has collected $O_j$, we set $a_{ij} = 1$;
otherwise $a_{ij} = 0$. Analogously, we set $a'_{jk} = 1$ if $O_j$ has
been assigned the tag $T_k$, and $a'_{jk} = 0$ otherwise. Furthermore,
the users' preferences on tags are represented by a weighted matrix
$A''$, where $a''_{ik}$ is the number of times that $U_i$ has adopted
$T_k$.
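
As an illustration of the three matrices just defined, the sketch below builds them from a list of entries. It assumes that users, objects and tags have already been mapped to integer indices; the function name `build_matrices` and the toy entries are hypothetical.

```python
import numpy as np

def build_matrices(entries, n, m, r):
    """Build A (user-object), A1 (object-tag) and A2 (user-tag) from entries.

    Each entry is (i, j, tags): user index i, object index j, and the list
    of tag indices that user i assigned to object j (indices are assumed).
    """
    A = np.zeros((n, m))    # a_ij = 1 if U_i has collected O_j
    A1 = np.zeros((m, r))   # a'_jk = 1 if O_j has been assigned T_k
    A2 = np.zeros((n, r))   # a''_ik = number of times U_i has adopted T_k
    for i, j, tags in entries:
        A[i, j] = 1
        for k in tags:
            A1[j, k] = 1
            A2[i, k] += 1
    return A, A1, A2

# Toy data: 3 users, 3 objects, 3 tags.
entries = [(0, 0, [0, 1]), (0, 1, [1]), (1, 1, [1, 2]), (2, 2, [2])]
A, A1, A2 = build_matrices(entries, n=3, m=3, r=3)
```

Note that only $A''$ is weighted; $A$ and $A'$ stay binary however often a link is repeated.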

We now introduce the proposed algorithm together with two baseline
algorithms: (I) user-object diffusion [5]; (II) user-object-tag
diffusion [25]; (III) user-tag-object diffusion (the proposed
algorithm). Given a target user $U_i$, each of the three algorithms
generates a final score $f_j$ for every object $O_j$, which is used to
rank the objects to be recommended to him/her. The algorithms are
described as follows:

(I) Suppose that a kind of resource is initially located on the
objects collected by the target user $U_i$. Each object distributes
its resource equally to all neighboring users, and then each user
redistributes the received resource equally to all his/her collected
objects. The final resource vector for the target user $U_i$, $f$,
after the two-step diffusion is:

$$f_j = \sum_{l=1}^{n} \sum_{s=1}^{m} \frac{a_{lj} a_{ls} a_{is}}{k(U_l) k(O_s)}, \quad j = 1, 2, \dots, m, \qquad (1)$$

where $k(U_l) = \sum_{j=1}^{m} a_{lj}$ is the number of collected
objects for user $U_l$, and $k(O_s) = \sum_{i=1}^{n} a_{is}$ is the
number of neighboring users for object $O_s$.
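
The two-step diffusion of algorithm I (objects to users, then users back to objects) can be sketched in matrix form as below. This is a minimal sketch rather than the authors' implementation: the helper `safe_divide` and the convention that nodes with zero degree pass on no resource are assumptions added for the example.

```python
import numpy as np

def safe_divide(x, k):
    """Elementwise x / k, returning 0 wherever the degree k is 0 (assumed convention)."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(k, dtype=float)
    return np.divide(x, k, out=np.zeros_like(x), where=k > 0)

def user_object_diffusion(A, i):
    """Final resource f for target user i after the two-step diffusion."""
    A = np.asarray(A, dtype=float)
    k_user = A.sum(axis=1)                    # k(U_l): objects collected by each user
    k_obj = A.sum(axis=0)                     # k(O_s): users neighboring each object
    on_users = A @ safe_divide(A[i], k_obj)   # step 1: objects -> users
    return A.T @ safe_divide(on_users, k_user)  # step 2: users -> objects

# Toy user-object matrix: 2 users, 3 objects.
A = np.array([[1, 1, 0],
              [0, 1, 1]])
f = user_object_diffusion(A, 0)
```

When no degree is zero the total resource is conserved, so the entries of `f` sum to the number of objects collected by the target user.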

(II) The initial resource is set as in algorithm I, but each object
distributes its resource equally to all neighboring tags, and then
each tag redistributes the received resource equally to all its
neighboring objects. Thus, the final resource vector, $f'$, is:

$$f'_j = \sum_{l=1}^{r} \sum_{s=1}^{m} \frac{a'_{jl} a'_{sl} a_{is}}{k'(T_l) k'(O_s)}, \quad j = 1, 2, \dots, m, \qquad (2)$$

where $k'(T_l) = \sum_{j=1}^{m} a'_{jl}$ is the number of neighboring
objects for tag $T_l$, and $k'(O_s) = \sum_{l=1}^{r} a'_{sl}$ is the
number of neighboring tags for object $O_s$.
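
Algorithm II follows the same pattern with tags as the intermediate layer. The sketch below is again illustrative only; `safe_divide` and the zero-degree convention are assumptions, and the matrices are toy data.

```python
import numpy as np

def safe_divide(x, k):
    """Elementwise x / k, returning 0 wherever the degree k is 0 (assumed convention)."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(k, dtype=float)
    return np.divide(x, k, out=np.zeros_like(x), where=k > 0)

def object_tag_object_diffusion(A, A1, i):
    """Final resource f' for target user i: objects -> tags -> objects."""
    A = np.asarray(A, dtype=float)
    A1 = np.asarray(A1, dtype=float)
    k1_tag = A1.sum(axis=0)                      # k'(T_l): objects neighboring each tag
    k1_obj = A1.sum(axis=1)                      # k'(O_s): tags neighboring each object
    on_tags = A1.T @ safe_divide(A[i], k1_obj)   # step 1: objects -> tags
    return A1 @ safe_divide(on_tags, k1_tag)     # step 2: tags -> objects

# Toy data: one user who collected objects 0 and 1; 3 objects, 2 tags.
A = np.array([[1, 1, 0]])
A1 = np.array([[1, 0],
               [1, 1],
               [0, 1]])
f_prime = object_tag_object_diffusion(A, A1, 0)
```

Object 2 receives a nonzero score only through the tag it shares with object 1, which is how tags relate objects that have no collector in common.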

Table 1: Basic statistics of the two data sets. $n$, $m$ and $r$ are
the total numbers of users, objects and tags, respectively.
$\langle k \rangle$, $\langle k' \rangle$ and $\langle k'' \rangle$
denote the average number of objects collected by a user, tags
assigned to an object and tags adopted by a user, respectively. Del.
and Mov. represent the data sets Del.icio.us and MovieLens,
respectively.

(III) Different from algorithms I and II, here the initial resource is
located on tags according to the frequencies with which they are used
by the target user $U_i$. Each tag then distributes its initial
resource directly and equally to all its neighboring objects. Thus,
the final resource vector, $f''$, reads:

$$f''_j = \sum_{l=1}^{r} \frac{a'_{jl} a''_{il}}{k'(T_l)}, \quad j = 1, 2, \dots, m. \qquad (3)$$

After obtaining the final scores of the objects, all the objects that
$U_i$ has not collected are ranked in descending order of score, and
the top $L$ objects are recommended to $U_i$.
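
The one-step diffusion of algorithm III and the final top-$L$ selection can be sketched together as below. The function names, the `safe_divide` helper and the stable tie-breaking by object index are assumptions made for this illustration; the text does not specify how ties are handled.

```python
import numpy as np

def safe_divide(x, k):
    """Elementwise x / k, returning 0 wherever the degree k is 0 (assumed convention)."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(k, dtype=float)
    return np.divide(x, k, out=np.zeros_like(x), where=k > 0)

def user_tag_object_diffusion(A1, A2, i):
    """Final resource f'' for target user i: tags -> objects in one step."""
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    k1_tag = A1.sum(axis=0)                  # k'(T_l): objects neighboring each tag
    return A1 @ safe_divide(A2[i], k1_tag)   # initial resource a''_il sits on the tags

def recommend(scores, collected, L):
    """Indices of the top-L uncollected objects in descending order of score."""
    scores = np.asarray(scores, dtype=float)
    candidates = np.flatnonzero(np.asarray(collected) == 0)
    order = np.argsort(-scores[candidates], kind="stable")  # ties keep index order
    return candidates[order][:L]

# Toy data: 3 objects, 2 tags; the user adopted tag 0 three times and tag 1 once.
A1 = np.array([[1, 0],
               [1, 1],
               [0, 1]])
A2 = np.array([[3, 1]])
scores = user_tag_object_diffusion(A1, A2, 0)
top = recommend(scores, collected=[1, 0, 0], L=2)
```

Only one matrix-vector product is needed per user here, compared with two for algorithms I and II.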

Compared with algorithms I and II, algorithm III has three advantages.
Firstly, since social tags strongly reflect users' personal
preferences, algorithm III is expected to generate more personalized
recommendations. Secondly, the one-step diffusion saves computational
time, especially for large-scale data. Thirdly, algorithm III reveals
the essential role of tags: building a bridge between users and
objects, and helping users retrieve and organize collections without
the limits of a hierarchical structure or a fixed vocabulary.

Metrics. - To give a solid and comprehensive evaluation of the
proposed algorithm, we employ three different metrics that
characterize the accuracy and diversity of recommendations.

1. Ranking Score (RS) [8]. - In the present case, for each entry in
the testing set (i.e., a user-object pair), RS is defined as the rank
of the object divided by the number of all uncollected objects for the
corresponding user. The smaller the RS, the higher the accuracy of the
algorithm. The average ranking score $\langle RS \rangle$ is obtained
by averaging over all entries in the testing set.

2. Inter Diversity (InterD) [30,31]. - InterD measures the differences
between different users' recommendation lists, and can thus be
understood as the inter-user diversity. Denote by $O_R^i$ the set of
recommended objects for user $U_i$; then

$$\mathrm{InterD} = \frac{1}{n(n-1)} \sum_{i \neq j} \left( 1 - \frac{|O_R^i \cap O_R^j|}{L} \right), \qquad (4)$$

where $L = |O_R^i|$ is the length of the recommendation list. On
average, a greater (smaller) InterD means greater (less)
personalization of users' recommendation lists.

Fig. 5: (Color online) $\langle OR \rangle$ as a function of $g$ for
the two data sets. The black squares represent $\langle OR \rangle$ of
objects and the red circles $\langle OR \rangle$ of tags.



3. Inner Diversity (InnerD) [31]. - InnerD measures the differences
between objects within a user's recommendation list, and can thus be
considered as the inner-user diversity. It reads

$$\mathrm{InnerD} = 1 - \frac{1}{L(L-1)} \sum_{j \neq l} S_{jl}, \qquad (5)$$

where
$S_{jl} = \frac{|\Gamma_{O_j} \cap \Gamma_{O_l}|}{\sqrt{|\Gamma_{O_j}| \times |\Gamma_{O_l}|}}$
is the cosine similarity between objects $O_j$ and $O_l$, and
$\Gamma_{O_j}$ denotes the set of users having collected object $O_j$.
On average, a greater (smaller) InnerD suggests greater (less) topic
diversification of users' recommendation lists.
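
The three metrics above can be sketched as follows. This is a hedged illustration rather than the evaluation code behind the reported numbers: the function names are hypothetical, ties in the ranking are resolved in favor of the target object (an assumption), both diversity measures are computed as plain averages over ordered pairs, and every recommended object is assumed to have at least one collector.

```python
import itertools
import numpy as np

def ranking_score(scores, collected, target):
    """Rank of the target object among uncollected objects, divided by their number."""
    scores = np.asarray(scores, dtype=float)
    uncollected = np.flatnonzero(np.asarray(collected) == 0)
    rank = 1 + np.sum(scores[uncollected] > scores[target])  # ties favor the target
    return rank / len(uncollected)

def inter_diversity(rec_lists):
    """Mean of 1 - |overlap| / L over all ordered pairs of users' lists."""
    L = len(rec_lists[0])
    dists = [1 - len(set(a) & set(b)) / L
             for a, b in itertools.permutations(rec_lists, 2)]
    return sum(dists) / len(dists)

def inner_diversity(rec_list, A):
    """One minus the mean cosine similarity between objects in one list."""
    A = np.asarray(A, dtype=float)
    L = len(rec_list)
    cols = A[:, rec_list]                 # users x L block of the user-object matrix
    common = cols.T @ cols                # number of common collectors per object pair
    deg = np.diag(common)                 # number of collectors of each object
    S = common / np.sqrt(np.outer(deg, deg))
    off_diagonal = S.sum() - np.trace(S)  # sum over pairs with j != l
    return 1 - off_diagonal / (L * (L - 1))

# Toy data: 2 users, 3 objects.
A = np.array([[1, 1, 0],
              [1, 0, 1]])
rs = ranking_score([0.9, 0.1, 0.5, 0.3], collected=[1, 0, 0, 0], target=3)
inter = inter_diversity([[0, 1], [1, 2], [0, 1]])
inner = inner_diversity([0, 1], A)
```

In this toy case objects 1 and 2 share no collector, so a list containing only those two has inner diversity 1.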

Results. - To make clear the role of social tags, a microscopic
picture of algorithmic accuracy is very helpful. In particular, since
social tags are used to describe objects, we would like to see how
accuracy depends on object degree, namely the number of users who have
collected the object. Given an object degree $k_o$, the
degree-dependent average ranking score, denoted by
$\langle RS \rangle_{k_o}$, is defined as the mean ranking score over
all the entries in the testing set whose objects have degree $k_o$.
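
The degree-dependent average can be sketched as a simple group-by over testing entries. The function name and the toy values are hypothetical; the inputs are assumed to be the ranking score of each testing entry and the degree of the corresponding object.

```python
from collections import defaultdict

def degree_dependent_rs(rs_values, object_degrees):
    """Average the ranking scores of testing entries grouped by object degree."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rs, k in zip(rs_values, object_degrees):
        sums[k] += rs
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy values: two testing entries on degree-1 objects, one on a degree-5 object.
rs_by_degree = degree_dependent_rs([0.2, 0.4, 0.1], [1, 1, 5])
```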

Table 2 and Table 3 give the overall $\langle RS \rangle$ of the three
algorithms for the two data sets, indicating that $\langle RS \rangle$
is significantly improved by the present algorithm. Fig. 1 reports the
correlation between accuracy and object degree. The ranking score
decays with increasing $k_o$ for all three algorithms. In addition,
the three curves intersect around $k_o \approx 10$, which is a
relatively small value considering the heterogeneous object-degree
distribution shown in Fig. 2. From Fig. 1 it is seen that the accuracy
of algorithm III is better than that of algorithms I and II for
$k_o \le 10$, but worse for $k_o > 10$ (see also Table 2 and Table 3).
This reminds us of the well-known cold-start problem in recommender
systems: how to recommend unpopular and/or new objects to users? It is
very difficult for a user to become aware of these cold objects by
random surfing since they are not hot items, and for a recommender
system to recommend them to the right places since there is usually
insufficient information about them. In fact, 90.04% and 69.35% of the
objects have $k_o \le 10$ in Del.icio.us and MovieLens, respectively.
Therefore, a successful recommender system has to make reasonable
recommendations of cold objects. Compared with algorithms I and II,
the present one can effectively help users find those cold objects via
social tags.

Fig. 6: (Color online) (a) $\langle E \rangle$ as a function of user
degree; (b) $\langle E \rangle$ as a function of object degree.

Table 2: Algorithmic accuracy for Del.icio.us.
$\langle RS \rangle_{k_o \le 10}$ is the average ranking score over
objects with degree less than or equal to 10, and
$\langle RS \rangle_{k_o > 10}$ is the average ranking score over
objects with degree greater than 10. Each value is obtained by
averaging over 50 realizations, each of which corresponds to an
independent division of training set and testing set.

Algorithms   $\langle RS \rangle$   $\langle RS \rangle_{k_o \le 10}$   $\langle RS \rangle_{k_o > 10}$
I            0.276                  0.369                               0.054
II           0.209                  0.275                               0.049
III          0.196                  0.249                               0.068

Fig. 3 and Fig. 4 show the experimental results of
$\langle InterD \rangle$ and $\langle InnerD \rangle$, respectively.
In Fig. 3, $\langle InterD \rangle$ is enhanced only for Del.icio.us.
The reason for the small $\langle InterD \rangle$ of algorithm III in
MovieLens is that there are only movies in that data set, and thus a
comparatively small number of tags are used, with large overlap. The
overlapping ratio, OR, of the tags that users assign to the same
objects is defined as:

$$\langle OR \rangle_g = \frac{1}{N_g} \sum_{G(i,j) = g} OR(i,j), \qquad (6)$$

where $G(i,j)$ denotes the number of common objects collected by users
$i$ and $j$, $N_g$ is the number of user pairs $(i,j)$ with
$i \neq j$ and $G(i,j) = g$, and $OR(i,j)$ is defined as the total
number of tag agreements on the same objects for the user pair
$(i,j)$. A similar definition can be used to quantify the overlapping
ratio of objects collected by users with the same tags. Clearly, a
larger OR indicates smaller diversity, and vice versa. Fig. 5 shows
the correlation between $\langle OR \rangle_g$ and $g$. One can see
that $\langle OR \rangle_g$ of tags is smaller than that of objects in
Del.icio.us, while this is not the case for MovieLens. In a word,
social tags can help generate more diverse recommendations only if the
tags are themselves used in a diverse way.
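
One possible reading of the tag overlapping ratio is sketched below. It rests on loudly stated assumptions: a "tag agreement" is counted as one tag that both users assigned to the same common object, user pairs with no common object are skipped, and the nested-dictionary input format and function name are hypothetical. Unordered pairs are used, which gives the same mean as ordered pairs because $OR(i,j)$ is symmetric.

```python
from collections import defaultdict
from itertools import combinations

def tag_overlap_by_common_objects(tagging):
    """Return {g: mean OR(i, j) over user pairs sharing exactly g objects}.

    tagging maps user -> {object: set of tags that user assigned to it}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, v in combinations(tagging, 2):
        common = tagging[u].keys() & tagging[v].keys()
        g = len(common)
        if g == 0:
            continue  # assumed: pairs without common objects are not counted
        # Assumed reading of "tag agreements": tags shared on each common object.
        agreements = sum(len(tagging[u][o] & tagging[v][o]) for o in common)
        sums[g] += agreements
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

# Toy data: three users tagging two objects.
tagging = {
    "u1": {"o1": {"a", "b"}, "o2": {"c"}},
    "u2": {"o1": {"a"}, "o2": {"c", "d"}},
    "u3": {"o1": {"b"}},
}
overlap = tag_overlap_by_common_objects(tagging)
```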

Fig. 4 shows that $\langle InnerD \rangle$ is generally improved by
our proposed algorithm, indicating that it can help users broaden
their horizons. The exception is MovieLens with very small $L$, which
again results from the narrow choice of tags in MovieLens. Recently,
the Shannon entropy has been widely used to quantify network diversity
in social sharing networks [32] and social economics [33]. In this
Letter, we also employ it to measure the individual usage pattern of
tags:

$$E(U_i) = -\sum_{t} p_{i,t} \log(p_{i,t}), \qquad (7)$$

where $p_{i,t}$ is the probability that tag $t$ is used by user $U_i$.
The dependence of entropy on user degree, $E_k$, is then given by
averaging $E(U_i)$ over all users with $k(U_i) = k$. A similar
definition can be used to quantify the dependence of entropy on object
degree. Clearly, a larger $E_k$ means that the users are more willing
to use tags on diverse topics, or that the objects are more likely to
be assigned more diverse tags, and vice versa. Fig. 6 shows that $E$
of Del.icio.us is greater than that of MovieLens for both users and
objects, indicating that Del.icio.us is a more diverse system than
MovieLens, and further giving a reasonable explanation of why
algorithm III obtains better InnerD in Del.icio.us than in MovieLens.
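
The entropy measure and its degree-dependent average can be sketched as below. The sketch assumes that $p_{i,t}$ is estimated as the fraction of user $U_i$'s tag usages that go to tag $t$ and that the natural logarithm is used; neither choice is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def tag_entropy(counts):
    """Shannon entropy of one user's tag usage counts (natural log assumed)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # unused tags contribute nothing
    return float(-(p * np.log(p)).sum())

def entropy_by_degree(A2, degrees):
    """Average entropy of the rows of the user-tag matrix, grouped by degree."""
    groups = {}
    for row, k in zip(np.asarray(A2), degrees):
        groups.setdefault(int(k), []).append(tag_entropy(row))
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Toy user-tag matrix: 3 users, 2 tags, with hypothetical user degrees.
A2 = np.array([[1, 1],
               [2, 0],
               [3, 3]])
E_k = entropy_by_degree(A2, degrees=[2, 1, 2])
```

A user who always uses the same tag has zero entropy, while spreading usage evenly over many tags maximizes it. Passing the transposed object-tag usage counts instead would give the analogous quantity for objects.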

Table 3: Algorithmic accuracy for MovieLens.

Algorithms   $\langle RS \rangle$   $\langle RS \rangle_{k_o \le 10}$   $\langle RS \rangle_{k_o > 10}$
I            [...]                  0.307                               0.039
II           0.130                  0.168                               0.055
III          0.123                  0.146                               0.070

Conclusions and Discussion. - In this Letter, we proposed a
recommendation algorithm that makes use of social tags. This algorithm
considers the frequencies of tags as user preferences on different
topics, and the tag-object links as semantic relations between them.
Experimental results demonstrated that the proposed algorithm
outperforms the two baseline algorithms in both accuracy and
diversity. The advantage of the present algorithm is especially
pronounced for objects with small degrees ($k_o \le 10$), which
constitute the majority of objects. Therefore, incorporating social
tags could be, to some extent, helpful in solving the cold-start
problem of recommender systems.

Recently, besides accuracy, the significance of diversity has
attracted more and more attention in information filtering [10].
Experimental results in this Letter demonstrated that a wide-range
adoption of social tags can enhance the diversity of recommendation.
Therefore, we strongly encourage recommender systems to add tagging
functions and users to organize their collections by using tags.
However, despite the significant role of tags, the polysemy and
synonymy problems [18] might result in coarse and inaccurate
performance. The tag clustering technique [34] may provide a promising
way to generate multi-scale recommendations and eventually obtain the
best performance.